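Where possible, the notation dest = op(dest, src) applies to both floats held in an MMX register; the operation is executed on the two in parallel. Here dest.lo and src.lo denote bits 31..0 of a register, and dest.hi and src.hi denote bits 63..32. Instructions shown in bold were introduced with the Athlon.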
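To make the notation concrete, the following C sketch models one MMX register as a pair of floats and applies two of the listed operations to both lanes. The names `packed2f`, `mask2`, `pfadd_model` and `pfcmpge_model` are hypothetical helpers invented for this illustration; they are not part of any AMD header or intrinsic set, and the real instructions operate on registers rather than structs.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit MMX register holding two floats. */
typedef struct {
    float lo;   /* bits 31..0  */
    float hi;   /* bits 63..32 */
} packed2f;

/* Hypothetical model of a comparison result: one 32-bit mask per lane. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} mask2;

/* pfadd: dest = dest + src, applied to both lanes independently. */
static packed2f pfadd_model(packed2f dest, packed2f src) {
    dest.lo += src.lo;
    dest.hi += src.hi;
    return dest;
}

/* pfcmpge: each lane becomes all ones if dest >= src, otherwise zero. */
static mask2 pfcmpge_model(packed2f dest, packed2f src) {
    mask2 m;
    m.lo = (dest.lo >= src.lo) ? 0xFFFFFFFFu : 0u;
    m.hi = (dest.hi >= src.hi) ? 0xFFFFFFFFu : 0u;
    return m;
}
```

With dest = {lo 1.0, hi 2.0} and src = {lo 3.0, hi 0.5}, the pfadd model yields {4.0, 2.5}, and the pfcmpge model yields a zero mask in the low lane and an all-ones mask in the high lane.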
| SIMD-FP instructions | Operation |
|---|---|
| pi2fd | dest = float(dword src) |
| pf2id | dest = dword(float src) |
| **pi2fw** | dest[63..32] = float(word src[47..32]), dest[31..0] = float(word src[15..0]) |
| **pf2iw** | dest[63..32] = word(float src[63..32]) sign-extended to 32 bits, dest[31..0] = word(float src[31..0]) sign-extended to 32 bits |
| pfacc | dest.lo = dest.lo + dest.hi, dest.hi = src.lo + src.hi |
| **pfnacc** | dest.lo = dest.lo - dest.hi, dest.hi = src.lo - src.hi |
| **pfpnacc** | dest.lo = dest.lo - dest.hi, dest.hi = src.lo + src.hi |
| pfadd | dest = dest + src |
| pfsub | dest = dest - src |
| pfsubr | dest = src - dest |
| pfcmpeq | dest = (dest == src) ? 0xFFFFFFFF : 0 |
| pfcmpge | dest = (dest >= src) ? 0xFFFFFFFF : 0 |
| pfcmpgt | dest = (dest > src) ? 0xFFFFFFFF : 0 |
| pfmin | dest = min (dest, src) |
| pfmax | dest = max (dest, src) |
| pfmul | dest = dest * src |
| pfrcp | dest.hi = dest.lo = approx14(1/src.lo) |
| pfrsqrt | dest.hi = dest.lo = approx15(1/sqrt(src.lo)) |
| pfrcpit1 | first iteration step of the reciprocal approximation |
| pfrcpit2 | second iteration step of the reciprocal and reciprocal square root approximations |
| pfrsqit1 | first iteration step of the reciprocal square root approximation |
| Extensions to MMX | |
| pavgusb | dest = average (dest, src) (on unsigned bytes) |
| pmulhrw | dest = (dest * src + 0x8000) >> 16 (on signed words); used instead of pmulhw for rounded fixed-point math |
| **pswapd** | dest.hi = src.lo, dest.lo = src.hi |
| femms | fast empty MMX state |
| prefetch | prefetch data into the L1 cache |
| prefetchw | prefetch data with intent to modify; on current processors it does not differ from prefetch |
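The less obvious rows can be checked against a small scalar model. The sketch below, in C, models pfacc on a pair of floats and models a single lane of pavgusb and pmulhrw. The helper names (`packed2f`, `pfacc_model`, `pavgusb_lane`, `pmulhrw_lane`) are hypothetical and exist only for this illustration. It assumes that pavgusb rounds its average up and that the compiler shifts negative values arithmetically, which holds on common x86 compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit MMX register holding two floats. */
typedef struct {
    float lo;   /* bits 31..0  */
    float hi;   /* bits 63..32 */
} packed2f;

/* pfacc: both sums read the original register contents, so the result
   is built in a separate variable before it replaces dest. */
static packed2f pfacc_model(packed2f dest, packed2f src) {
    packed2f r;
    r.lo = dest.lo + dest.hi;
    r.hi = src.lo + src.hi;
    return r;
}

/* pavgusb, one of eight byte lanes: rounded average of unsigned bytes. */
static uint8_t pavgusb_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* pmulhrw, one of four word lanes: signed 16x16 multiply, add 0x8000
   to round, keep the high 16 bits of the 32-bit product. */
static int16_t pmulhrw_lane(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * (int32_t)b;
    /* Assumes an arithmetic right shift for negative values. */
    return (int16_t)((product + 0x8000) >> 16);
}
```

For example, multiplying 3 by 0x4000 (0.5 in 1.15 fixed point, scaled here as a plain high-word product) gives 1 with the pmulhrw model, whereas a truncating high-word multiply such as pmulhw would give 0; this is the rounding difference that matters for fixed-point code.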